Efficient Asymptotic Approximation in Temporal Difference Learning

Authors

  • Frédérick Garcia
  • Florent Serre
Abstract

TD(λ) is an algorithm that learns the value function associated with a policy in a Markov Decision Process (MDP). We propose in this paper an asymptotic approximation of online TD(λ) with accumulating eligibility trace, called ATD(λ). We then use the Ordinary Differential Equation (ODE) method to analyse ATD(λ) and to optimize the choice of the parameter λ and the learning step size, and we introduce ATD, a new efficient temporal difference learning algorithm.
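
To make the setting concrete, here is a minimal sketch of the classical online TD(λ) update with accumulating eligibility traces that ATD(λ) approximates; the tabular representation, the `env`/`policy` interface, and the constant step size are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def td_lambda(env, policy, n_states, gamma=0.95, lam=0.9, alpha=0.1, n_episodes=500):
    """Tabular online TD(lambda) with accumulating eligibility traces (sketch).

    Hypothetical interfaces: env.reset() -> state,
    env.step(action) -> (next_state, reward, done), policy(state) -> action.
    """
    V = np.zeros(n_states)            # value-function estimate
    for _ in range(n_episodes):
        e = np.zeros(n_states)        # eligibility trace, reset each episode
        s, done = env.reset(), False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            delta = r + gamma * (0.0 if done else V[s_next]) - V[s]  # TD error
            e[s] += 1.0               # accumulating trace: add 1, do not clip
            V += alpha * delta * e    # online update of all visited states
            e *= gamma * lam          # decay all traces
            s = s_next
    return V
```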


Related articles

Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning

We present for the first time an asymptotic convergence analysis of two-timescale stochastic approximation driven by controlled Markov noise. In particular, both the faster and slower recursions have non-additive Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both times...
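
As a rough illustration of the two recursions described above, the following sketch runs a generic two-time-scale stochastic approximation with a fast and a slow iterate; the mean fields `h` and `g`, the step-size schedules, and the i.i.d. Gaussian noise are simplifying assumptions and do not capture the controlled Markov noise analysed in the paper.

```python
import numpy as np

def two_timescale_sa(h, g, x0, y0, n_iter=10_000, noise_scale=0.1, seed=0):
    """Generic two-time-scale stochastic approximation (sketch).

    x is the fast recursion (larger step sizes a_n), y the slow one
    (smaller step sizes b_n, with b_n / a_n -> 0).
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x0, float), np.asarray(y0, float)
    for n in range(1, n_iter + 1):
        a_n = 1.0 / n ** 0.6          # fast step size
        b_n = 1.0 / n                 # slow step size, b_n / a_n -> 0
        x = x + a_n * (h(x, y) + noise_scale * rng.standard_normal(x.shape))
        y = y + b_n * (g(x, y) + noise_scale * rng.standard_normal(y.shape))
    return x, y
```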


From Q(λ) to Average Q-learning: Efficient Implementation of an Asymptotic Approximation

Q(λ) is a reinforcement learning algorithm that combines Q-learning and TD(λ). Online implementations of Q(λ) that use eligibility traces have been shown to speed up basic Q-learning. In this paper we present an asymptotic analysis of Watkins' Q(λ) with accumulative eligibility traces. We first introduce an asymptotic approximation of Q(λ) that appears to be a gain matrix variant of basic Q-learnin...
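
For reference, a minimal tabular sketch of Watkins' Q(λ) with accumulating eligibility traces follows; the `env` interface, the ε-greedy behaviour policy, and the constant step size are illustrative assumptions, and the gain-matrix approximation mentioned above is not reproduced here.

```python
import numpy as np

def watkins_q_lambda(env, n_states, n_actions, gamma=0.95, lam=0.9,
                     alpha=0.1, epsilon=0.1, n_episodes=500, seed=0):
    """Tabular Watkins' Q(lambda) with accumulating eligibility traces (sketch).

    Hypothetical interface: env.reset() -> state,
    env.step(a) -> (next_state, reward, done).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_episodes):
        e = np.zeros_like(Q)              # traces reset at episode start
        s, done = env.reset(), False
        while not done:
            greedy_a = int(np.argmax(Q[s]))
            a = rng.integers(n_actions) if rng.random() < epsilon else greedy_a
            s_next, r, done = env.step(a)
            target = 0.0 if done else Q[s_next].max()
            delta = r + gamma * target - Q[s, a]     # TD error w.r.t. greedy value
            e[s, a] += 1.0                           # accumulating trace
            Q += alpha * delta * e
            if a == greedy_a:
                e *= gamma * lam                     # greedy step: decay traces
            else:
                e[:] = 0.0                           # exploratory step: cut traces
            s = s_next
    return Q
```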


Preconditioned Temporal Difference Learning

LSTD is numerically unstable for some ergodic Markov chains that visit certain states far more often than the remaining ones, because the matrix that LSTD accumulates has a large condition number. In this paper, we propose a variant of temporal difference learning with high data efficiency. A class of preconditioned temporal difference learning algorithms is also proposed to speed up the new met...
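
The conditioning issue can be seen directly in the matrix that LSTD builds. The sketch below accumulates the usual LSTD system and applies a simple Jacobi (diagonal) preconditioner before solving; the feature map `phi`, the transition format, and the choice of preconditioner are illustrative assumptions, not the scheme proposed in the paper.

```python
import numpy as np

def preconditioned_lstd(samples, phi, d, gamma=0.95):
    """LSTD normal equations with a generic Jacobi (diagonal) preconditioner (sketch)."""
    A = np.zeros((d, d))
    b = np.zeros(d)
    for s, r, s_next in samples:              # samples: list of (s, reward, s') transitions
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)  # LSTD matrix accumulation
        b += r * f
    print("condition number of A:", np.linalg.cond(A))
    # Jacobi preconditioning: rescale each row by 1 / A[i, i] (assumes a
    # non-zero diagonal); this improves conditioning for iterative solvers.
    d_inv = np.diag(1.0 / np.diag(A))
    theta = np.linalg.solve(d_inv @ A, d_inv @ b)
    return theta
```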


A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal Difference Learning

The traditional Kalman filter can be viewed as a recursive stochastic algorithm that approximates an unknown function via a linear combination of prespecified basis functions given a sequence of noisy samples. In this paper, we generalize the algorithm to one that approximates the fixed point of an operator that is known to be a Euclidean norm contraction. Instead of noisy samples of the desire...
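
A compact sketch of that Kalman-filter view for plain noisy linear regression follows; the sample format, feature map `phi`, and noise variance are illustrative assumptions, and the fixed-point generalisation described in the paper is not reproduced here.

```python
import numpy as np

def recursive_kalman_regression(samples, phi, d, obs_noise=1.0):
    """Kalman-filter view of recursive linear function approximation (sketch).

    Estimates weights w so that phi(x) @ w fits noisy targets y,
    processing one (x, y) sample at a time.
    """
    w = np.zeros(d)                 # weight estimate
    P = np.eye(d) * 1e3             # error covariance (large initial uncertainty)
    for x, y in samples:
        f = phi(x)
        innov = y - f @ w           # prediction error
        S = f @ P @ f + obs_noise   # innovation variance
        K = (P @ f) / S             # Kalman gain
        w = w + K * innov           # measurement update
        P = P - np.outer(K, f @ P)  # covariance update
    return w
```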


Proximal Gradient Temporal Difference Learning Algorithms

In this paper, we describe proximal gradient temporal difference learning, which provides a principled way for designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not with respect to their original objective functions as previously attempted, but rather with respect to pri...
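
As background for the gradient-TD family discussed above, here is a single TDC-style update with linear features and the auxiliary weight vector those methods maintain; the step sizes and the specific variant (TDC rather than the paper's proximal formulation) are illustrative assumptions.

```python
import numpy as np

def tdc_update(theta, w, phi_s, phi_s_next, reward,
               gamma=0.95, alpha=0.01, beta=0.1):
    """One TDC (gradient-TD) update with linear features (sketch).

    theta: value-function weights; w: auxiliary weights that gradient-TD
    methods maintain; alpha, beta: illustrative step sizes.
    """
    delta = reward + gamma * phi_s_next @ theta - phi_s @ theta      # TD error
    theta = theta + alpha * (delta * phi_s
                             - gamma * phi_s_next * (phi_s @ w))     # corrected TD term
    w = w + beta * (delta - phi_s @ w) * phi_s                       # auxiliary update
    return theta, w
```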



Journal:

Volume   Issue

Pages  -

Publication date: 2000